Basic Statistics with Excel
1. Descriptive Statistics
Use
Averages to summarize your data
Suppose
you’ve done your experiment and you have data on 400 people. Are you going to
make a graph with 400 sets of lines on it? Are you going to make a table with
400 rows of data?
Of
course not! (HINT)
To practice, use the data file ‘BP Data’ which is on all the
school computers in the BI 231 folder. To get to this, click on the “Courses” folder
on the desktop. Then choose “BI231-net”
and finally “BP Data.” Choose the sheet
labeled ‘Descriptive Statistics Practice,’ which presents systolic and
diastolic blood pressure data for male and female children from 1-17 years of
age.
(Blood pressure data adapted from US
Department of Health & Human Services National Heart, Lung & Blood
Institute’s
“Fourth Report
on the Diagnosis, Evaluation, and Treatment of High Blood Pressure in Children
and Adolescents”)
What
is the average systolic blood pressure for the male children?
1.
Find the column with the male
children’s systolic blood pressure values.
2.
Click on the empty cell below the
last number in the column
3.
Click on the “fx” button next to the bar at the top of the spreadsheet
4.
A prompt box will appear.
5.
Click on “AVERAGE”
6.
If it is not on the list, type “average” in the search field and click “OK”
7.
A dialog box will appear, asking you what
cells you want to average
8.
Double check to make
sure they are the cells you want averaged, then click OK.
9.
The average will appear! You may forget what
it is if you don’t label it, so click on the box just before it and type
‘average’ to remind yourself.
When
you report an average in your paper, you should also report the number of
people you took the average over. It is abbreviated ‘n’. You would write:
“The average systolic blood pressure for male children was 115 mm Hg (n=17).”
In your data table, you would have a column for the n value.
Sex |
Number
of Subjects |
Average
Systolic Blood Pressure (mmHg) |
Male |
17 |
115 |
Female |
|
|
Now, find the average systolic blood pressure for the female children and fill
in the rest of the table above.
What
is the average systolic blood pressure for just 1 year old children? This one is a little trickier, because you
only want to average two numbers. To do it:
1.
Click on the empty cell at the end
of the row containing the data (J2)
2.
Hit the “fx” button next
to the bar at the top of the spreadsheet
3.
A prompt box will appear.
4.
Click on “AVERAGE”
5.
If it is not on the list, type “average” in the search field and click “OK”
6.
A dialog box will appear, asking you what
cells you want to average
7.
Double check to make
sure they are the cells you want averaged, then click OK.
8.
If it is not on the list, click
“more functions,” choose “statistical,” then “AVERAGE”
9.
A dialog box will appear, asking you
what cells you want to average. It will give you the wrong cells, in this case!
To select the two cells, click on the first (C2), hold down the “Ctrl” key and
click on the second cell (H2)
10.
Once again, you may forget what the
figure in this box means, so you might want to type ‘Average Systolic Blood
Pressure by Age’ in the column heading.
You
would express this in your paper by writing, “The average systolic blood
pressure for one-year-olds was 99.5 mm Hg (n=2).”
Now
calculate the average blood pressures for all ages. You can do this by clicking
on the cell with the average figure for one-year-olds. A border will appear
around the cell. Put the white cross cursor on the lower right corner of the
border, it will turn black. Hold the
mouse button down, and drag downwards to fill the column. This will copy the
equation you just created into all the other cells in the column.
Age |
Number
of Subjects |
Average
Systolic Blood Pressure (mmHg) |
Average
Diastolic Blood Pressure (mmHg) |
|
Age |
Number
of Subjects |
Average
Systolic Blood Pressure (mmHg) |
Average
Diastolic Blood Pressure (mmHg) |
1 |
2 |
99.5 |
|
|
10 |
|
|
|
2 |
|
|
|
|
11 |
|
|
|
3 |
|
|
|
|
12 |
|
|
|
4 |
|
|
|
|
13 |
|
|
|
5 |
|
|
|
|
14 |
|
|
|
6 |
|
|
|
|
15 |
|
|
|
7 |
|
|
|
|
16 |
|
|
|
8 |
|
|
|
|
17 |
|
|
|
9 |
|
|
|
|
|
|
|
|
Graph
the Average Values to make your data presentation simpler.
Create
a chart comparing the average systolic blood pressures of male to the average
systolic blood pressure of female children.
1.
This chart will only contain two
values. To select them, click on one value and then hold the ‘Ctrl’ key down and
click on the second value.
2.
When you’ve selected both cells,
click on the “Insert” tab and select the 2-D column graph.
3.
You will notice that the columns are
labeled “1 & 2” when they should be labeled Male & Female. To change this right click on the graph and
choose “Select Data.”
4.
Click on “Edit” for the horizontal
axis labels.
5.
You can then go to your spreadsheet,
hold down the “Ctrl” key and click on “Male” and “Female” and click OK.
6.
To add a title and axis labels click
on the ‘Design’ tab and choose “Chart Layouts.” Choose a layout that
includes a title and a label for each axis.
Now you can edit the title until it says what you want it to. When you title your graph, it should
specifically state what can be concluded from your graph.
7.
You’ll see a big problem with this
chart – it only shows blood pressure values starting at 112 mm Hg. Doesn’t it
look as if boys and girls are very different? Charts that do this are regarded
as dishonest, so you want to fix it right away. To do this, right click on the
‘112’ value on the chart.
8.
A box will appear, choose “Format
Axis.”
9.
For the minimum, select “Fixed” and
set at 0. Click close.
Is
the average good enough? Adding Standard Deviations
Which
average is more accurate – the average value for male children’s blood
pressure, or the average value for one-year-olds’ blood pressure? If you look
at the data, you see that the average for male children is less accurate.
Several of the children have blood pressures that are quite different from the
average. So if you tell your readers the average value, you might be misleading
them. How can you give them a better idea of the data, without making them read
the whole data set?
You
do this by giving not only the average but the STANDARD DEVIATION.
To
calculate the standard deviation, follow the same set of steps you did for the
average, but choose STDEV instead of AVERAGE.
Calculate
the standard deviation for the systolic blood pressures of male children and for
the systolic blood pressures of female children. Be careful to check which cells the computer is calculating the data for, so it
doesn’t include your averages with the data.
You’ll find that the standard deviation for the male children’s blood pressure is higher than the standard deviation for female children’s blood pressure. That means you can trust the average for females more than the average for male children.
To present these data to your readers, you would write:
“The
systolic blood pressure of male children from ages one to 17 averaged 115 mm Hg
(n = 17; S.D. = 9.76mm Hg).”
or you might
write:
“The
systolic blood pressure of male children from ages one to 17 averaged 117 mm Hg
+/- 9.76 (n=17).”
You
should give the standard deviation any time you give an average. You do this by giving the value for the average, plus or
minus the standard deviation.
Sex |
Number
of Subjects |
Average
Systolic Blood Pressure (mmHg) |
Male |
17 |
115
+/- 9.76 |
Calculate
these standard deviations and express them in table and phrase form below:
Sample |
Number
of Subjects |
Average
Systolic Blood Pressure (mmHg) |
3-year
olds |
|
|
females |
|
|
8-year-olds |
|
|
3-year-olds:
females:
8-year-olds:
You can also represent your standard deviation on your graph by adding error bars.
1.
Under “Chart Tools, Layout” click on
“Error Bars” under the “Analysis” tab.
2.
A few options will appear; select
“More Error Bar Options…” at the bottom.
3.
On the pop-up box, choose “Custom”
and click on “Specify Value.”
4.
The custom error bars box will
appear; for the “Positive Error Value” go to the spreadsheet and select both
standard deviations (you will need to hold the Ctrl key down to select both). You will do the same thing for the “Negative
Error Value.”
5.
Your graph will now contain the
error bars and give a visual representation of the standard deviation. You will notice that the two bars overlap;
this suggests that the data is not statistically significant.
2. Comparative Statistics
Statistics
aren’t just used to make your data presentation simpler. They’re used to tell
whether the difference between two sets of data is significant or not. That is,
was it a real difference or just due to chance?
To
practice with these, go to the second sheet in the BP Data file – the sheet
labeled ‘T-test practice data.’ The question you want to answer is, did
treatment A increase blood pressure, or were these
changes just due to chance?
Looking
at the raw data, you see that many of the values increased, but some went down.
What happens if you compare the average values? Calculate the average values
and standard deviations for initial blood pressure and blood pressure after
treatment A, and graph them. Don’t forget to label the averages on your
spreadsheet, and to adjust the graph axis to start at 0.
It’s
not clear whether there was a real change, is there? To find out, we’ll have to
use some more sophisticated statistics and do a T-test.
A
T-test compares two sets of values and gives you the p value -- the
probability that they are just due to chance. Ideally, you want a very low
probability, a p value less than 0.05. To do a T-test and compare these
sets of blood pressures:
1.
Click on any empty cell
2.
Hit the “fx” button
next to the bar at the top of the spreadsheet
3.
A prompt box will appear.
4.
Click on “TTEST”
5.
If it is not on the list, type “ttest” in the
search field and click “OK”
6.
A dialog box will appear. Click in the box
next to “Array 1.”
7.
Drag the dialog box out of the way
and highlight your first column of numbers.
8.
Click in the box next to “Array 2”
and highlight your second column of numbers.
These are the two sets of data you want to compare.
To
answer the ‘tails’ you need to look at the hypothesis. If your hypothesis is specific and states that
one value would be higher than the other, choose 1 tail and type the number 1
in the box. If you just predicted that they would be different, pick 2 tails
and type the number 2 in the box. The hypothesis for the data was “Treatments A & B will both lower systolic blood pressure.”
To
choose the ‘type’ of T test, you will need to determine who your subjects
were. If you used the same subject for
both the initial trial and for Treatment A, you will need to choose a “paired”
t-test and will type the number 1 in the box.
If you used a control group of subjects for the initial trial and an
experimental group of subjects for Treatment B, you will need to choose an
“unpaired” t-test and will type the number 2 in the box. These
data were before-and-after data for the same subjects, so use a paired t-test.
Now
hit ‘OK’ and see what the number is. This is your p-value, the probability that
the difference is just due to chance alone. A p-value of 0.05 or less is
generally considered statistically significant, and one of 0.05 or greater
indicates no real difference between the groups.
In
your paper, you would express these data by writing: “Although the average
systolic blood pressure increased by 8.2 mm Hg after treatment A, the
difference was not statistically significant (p= 0.111).”
Now
compare the initial systolic blood pressures with the systolic blood pressures
after treatment B, using a T-test. Does treatment B cause a statistically
significant change in blood pressure? How would you express this?
TURN
IN:
1.
A printout of your graph comparing
the average systolic blood pressures of male and female children.
2.
A sentence that tells what effect
treatment B had on systolic blood pressure, and whether it was statistically
significant or not.
3.
A graph of the average initial
systolic blood pressure, the average blood pressure after treatment A and the
average blood pressure after Treatment B.
You graph should include a specific title, labeled axes and error bars
representing the standard deviations.